AITopics | alpha vector

alpha vector

Appendix A for AdaOPS

Neural Information Processing Systems, Aug-18-2025, 16:51:32 GMT

According to Alg. 2, each exploration expands at least one leaf node, so AdaOPS is guaranteed to terminate. First, we will demonstrate that the value of any belief can be formulated as an integral. This lemma is a concentration inequality for the self-normalized importance sampling estimator. The ESS threshold µ for adaptive resampling is set to .
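The excerpt mentions a self-normalized importance sampling estimator and an effective sample size (ESS) threshold for adaptive resampling. The following is a minimal sketch of those two ideas on a weighted particle set, not the AdaOPS implementation. The function names, the `threshold_fraction` parameter, and the choice to express the threshold as a fraction of the particle count are assumptions made for illustration; the excerpt does not give the value of µ.

```python
import random


def effective_sample_size(weights):
    """ESS = (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    squared = sum(w * w for w in weights)
    if squared == 0.0:
        raise ValueError("at least one weight must be positive")
    return total * total / squared


def self_normalized_estimate(values, weights):
    """Weighted average of values, with weights normalized by their own sum."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def maybe_resample(particles, weights, threshold_fraction, rng):
    """Resample only when ESS drops below threshold_fraction * len(particles).

    threshold_fraction is a hypothetical stand-in for the ESS threshold;
    its value in the paper is not given in the excerpt.
    """
    n = len(particles)
    if effective_sample_size(weights) >= threshold_fraction * n:
        return list(particles), list(weights)
    # Multinomial resampling: draw n particles proportionally to weight,
    # then reset all weights to be uniform.
    resampled = rng.choices(particles, weights=weights, k=n)
    return resampled, [1.0] * n
```

With uniform weights the ESS equals the number of particles and no resampling happens; when a single particle carries all the weight the ESS is 1 and the set is resampled.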

adaops, artificial intelligence, max 1, (18 more...)

Technology: Information Technology > Artificial Intelligence > Robots (0.50)

Entropy-regularized Point-based Value Iteration

Delecki, Harrison, Vazquez-Chanlatte, Marcell, Yel, Esen, Wray, Kyle, Arnon, Tomer, Witwicki, Stefan, Kochenderfer, Mykel J.

arXiv.org Artificial Intelligence, Feb-14-2024

Model-based planners for partially observable problems must accommodate both model uncertainty during planning and goal uncertainty during objective inference. However, model-based planners may be brittle under these types of uncertainty because they rely on an exact model and tend to commit to a single optimal behavior. Inspired by results in the model-free setting, we propose an entropy-regularized model-based planner for partially observable problems. Entropy regularization promotes policy robustness for planning and objective inference by encouraging policies to be no more committed to a single action than necessary. We evaluate the robustness and objective inference performance of entropy-regularized policies in three problem domains. Our results show that entropy-regularized policies outperform non-entropy-regularized baselines in terms of higher expected returns under modeling errors and higher accuracy during objective inference.
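To illustrate what "no more committed to a single action than necessary" can look like, here is a minimal sketch of a generic maximum-entropy (softmax) policy over per-action alpha vectors at one belief. It is the standard log-sum-exp form from entropy-regularized control, shown as an assumption; the paper's actual backup is not reproduced in this abstract. The names `soft_policy`, `action_alphas`, and `temperature` are hypothetical.

```python
import math


def soft_policy(action_alphas, belief, temperature):
    """Softmax policy and soft value at a belief.

    action_alphas maps each action to one alpha vector (a list of per-state
    values); this one-vector-per-action layout is an illustrative assumption.
    """
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    # Action value at the belief: inner product of alpha vector and belief.
    q = {
        action: sum(a * b for a, b in zip(alpha, belief))
        for action, alpha in action_alphas.items()
    }
    best = max(q.values())
    # Subtract the maximum before exponentiating for numerical stability.
    exps = {action: math.exp((v - best) / temperature) for action, v in q.items()}
    z = sum(exps.values())
    policy = {action: e / z for action, e in exps.items()}
    # Soft value: temperature * log-sum-exp of the scaled action values.
    soft_value = best + temperature * math.log(z)
    return policy, soft_value
```

A larger temperature spreads probability across actions with similar values, while a temperature near zero approaches the usual greedy choice.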

agent, alpha vector, pomdp, (12 more...)

2402.09388

Country:

North America > United States > California > Santa Clara County > Stanford (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.05)
North America > United States > California > Santa Clara County > Santa Clara (0.05)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.79)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.57)

Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes

Zhou, Runlong, Wang, Ruosong, Du, Simon S.

arXiv.org Artificial Intelligence, May-21-2023

We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight. We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver. We prove an $\tilde{O}(\sqrt{\mathsf{Var}^\star M \Gamma S A K})$ regret bound where $\tilde{O}$ hides logarithmic factors, $M$ is the number of contexts, $S$ is the number of states, $A$ is the number of actions, $K$ is the number of episodes, $\Gamma \le S$ is the maximum transition degree of any state-action pair, and $\mathsf{Var}^\star$ is a variance quantity describing the determinism of the LMDP. The regret bound only scales logarithmically with the planning horizon, thus yielding the first (nearly) horizon-free regret bound for LMDPs. This is also the first problem-dependent regret bound for LMDPs. Key in our proof is an analysis of the total variance of alpha vectors (a generalization of value functions), which is handled with a truncation method. We complement our positive result with a novel $\Omega(\sqrt{\mathsf{Var}^\star M S A K})$ regret lower bound with $\Gamma = 2$, which shows that our upper bound is minimax optimal when $\Gamma$ is a constant for the class of variance-bounded LMDPs. Our lower bound relies on new constructions of hard instances and an argument inspired by the symmetrization technique from theoretical computer science, both of which are technically different from existing lower bound proofs for MDPs, and thus can be of independent interest.
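The abstract describes alpha vectors as a generalization of value functions. As a minimal sketch in the standard POMDP sense (an assumption here, since the abstract does not define them for LMDPs), each alpha vector assigns a value to every hidden state, and the value of a belief is the best inner product over the set. The function name `belief_value` is hypothetical.

```python
def belief_value(alpha_vectors, belief):
    """Value of a belief as the maximum inner product over alpha vectors.

    Each alpha vector holds one value per hidden state; belief is a
    probability distribution over the same states.
    """
    if abs(sum(belief) - 1.0) > 1e-9:
        raise ValueError("belief must sum to 1")
    return max(
        sum(a * b for a, b in zip(alpha, belief))
        for alpha in alpha_vectors
    )
```

When the belief puts all its mass on one state, this reduces to reading off the largest entry for that state, which is how an ordinary state-value function appears as a special case.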

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2210.11604

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

Technical Report: The Policy Graph Improvement Algorithm

Pajarinen, Joni

arXiv.org Artificial Intelligence, Sep-4-2020

Optimizing a partially observable Markov decision process (POMDP) policy is challenging. The policy graph improvement (PGI) algorithm for POMDPs represents the policy as a fixed-size policy graph and improves the policy monotonically. Due to the fixed policy size, computation time for each improvement iteration is known in advance. Moreover, the method allows for compact, understandable policies. This report describes the technical details of the PGI [1] and particle-based PGI [2] algorithms for POMDPs in a more accessible way than [1] or [2], allowing practitioners and students to understand and implement the algorithms.
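For readers unfamiliar with the representation, a fixed-size policy graph can be sketched as a small finite-state controller: each node prescribes an action, and the observation received selects the next node. The code below only executes such a graph; it does not implement the PGI improvement step. The names `GraphNode` and `run_policy_graph`, and the example actions and observations, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GraphNode:
    action: str      # action taken while the controller is in this node
    next_node: dict  # maps each observation to the index of the next node


def run_policy_graph(nodes, observations, start=0):
    """Return the actions chosen while following the observation sequence."""
    actions = []
    current = start
    for observation in observations:
        actions.append(nodes[current].action)
        current = nodes[current].next_node[observation]
    # Action of the node reached after the final observation.
    actions.append(nodes[current].action)
    return actions


# Two-node example graph with made-up actions and observations.
example_graph = [
    GraphNode("listen", {"left": 1, "right": 0}),
    GraphNode("open-right", {"left": 0, "right": 0}),
]
```

Because the number of nodes is fixed, the memory needed to store the policy does not grow with the planning horizon.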

artificial intelligence, machine learning, policy graph, (15 more...)

2009.02164

Country: